BACKGROUND OF THE INVENTION

1. Field of the Invention 

This invention pertains generally to protocols for network traffic routing, and more 
particularly to a loop-free multipath routing protocol based on distance vectors. 

2. Description of the Background Art 

Routing protocols using the "Distributed Bellman-Ford" (DBF) algorithm exhibit an
excessively long convergence process toward correct routes when link costs
increase. A more serious deficiency of the DBF algorithm is that it is unable to
converge when a set of link failures results in a network partition, which is commonly
referred to as the counting-to-infinity problem. Moreover, typical routing protocols used
in the IP Internet provide a single next-hop choice for packet forwarding. A single
next-hop choice is inadequate for traffic load balancing, and it allows temporary
routing loops to form during network transitions, which degrades network
performance.

Routing may be described as the problem of determining, at each node and for each
destination in the network, a set of successor choices (i.e., next hops) to be used
for packet forwarding. Formally, let a computer network be represented as a graph
$G = (N, L)$, where $N$ is the set of nodes (routers) and $L$ is the set of edges (links).
The set of neighbors of node $i$ is denoted by $N^i$. The problem consists of finding the
successor set at each router $i$ for each destination $j$, denoted by $S^i_j \subseteq N^i$,
so that when router $i$ receives a packet for destination $j$, it can forward the packet
to one of the neighbor routers in the successor set $S^i_j$. By repeating this process at
every router, the packet is expected to reach the destination. If the routing graph $SG_j$
is defined as the directed subgraph of $G$ with link set
$\{(m, n) \mid n \in S^m_j,\ m \in N\}$, then a packet destined for $j$ follows a path in
$SG_j$. Two criteria determine the efficiency of the routing graph constructed by the
protocol: loop-freedom and connectivity. $SG_j$ is required to be free of loops, at least
when the network is stable, because routing loops degrade network performance. In a
dynamic environment, a stricter requirement is that $SG_j$ be loop-free at every instant:
if $S^i_j$ and $SG_j$ are parameterized by time $t$, then $SG_j(t)$ should be free of loops
at any time $t$. If there is at most one element in each $S^i_j$, then $SG_j$ is a tree
and there is only one path from any node to node $j$. On the other hand, if $S^i_j$ has
more than one element, then $SG_j$ is a directed acyclic graph (DAG) with greater
connectivity than a simple tree, and it can be used to enable traffic load balancing.
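As a minimal sketch (Python 3.9+, standard-library `graphlib` only), the successor sets $S^i_j$ for one destination can be held in a dictionary from node to set of next hops; the link set of $SG_j$, the loop-freedom requirement, and the tree (single-path) case described above then become one-line checks. The function names and the example successor sets are illustrative assumptions, not part of the invention.

```python
from graphlib import TopologicalSorter, CycleError


def routing_graph(successors):
    """Link set of SG_j: {(m, n) | n in S^m_j, m in N}."""
    return {(m, n) for m, succ in successors.items() for n in succ}


def is_loop_free(successors):
    """True if the directed graph SG_j contains no cycle."""
    try:
        TopologicalSorter(successors).prepare()  # raises CycleError on a loop
        return True
    except CycleError:
        return False


def is_single_path(successors):
    """At most one successor per node: a loop-free SG_j is then a tree."""
    return all(len(s) <= 1 for s in successors.values())


# Hypothetical successor sets for destination "j".
dag = {"j": set(), "a": {"j"}, "b": {"a", "j"}, "c": {"a", "b"}}
loop = {"j": set(), "a": {"b"}, "b": {"a"}}
```

With `dag`, nodes b and c each have two next hops and can split traffic, while `loop` (a and b pointing at each other) is rejected as a routing graph.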

The importance of using a successor set instead of a single successor per
destination, and the need for instantaneous loop-freedom of $SG_j$, have been
demonstrated in recent work describing a load-balancing routing framework that
obtains "near-optimal" delays. A key component required by this framework is a
routing protocol that quickly determines multiple successor choices for
packet forwarding, such that the routing graphs implied by the routing tables are free of
loops even during network transitions. By load-balancing traffic over the multiple
next-hop choices, congestion and delays are significantly reduced.

A number of limitations exist in current Internet routing protocols. The
widely deployed routing protocol RIP provides only a single next-hop choice for each
destination and does not prevent temporary loops from forming. A protocol from
Cisco™ referred to as EIGRP ensures loop-freedom but can guarantee only a single
loop-free path to each destination at any given router. The link-state protocol known as
OSPF offers a router multiple choices for packet forwarding only when those choices
offer the minimum distance. When the link-cost metric has fine granularity, perhaps
for the sake of accuracy, it is less likely that multiple paths with equal distance exist
between a source-destination pair, so the full connectivity of the network is not used
for load balancing. Also, OSPF and other similar algorithms based on topology broadcast
incur excessive communication overhead, often forcing network administrators to
partition the network into areas connected by a backbone. This makes OSPF complex in
terms of the required router configurations.

Several routing algorithms based on distance vectors have been proposed within
the industry. However, with the exception of DASM (Zaumen, W. T. and Garcia-Luna-Aceves,
"Loop-Free Multipath Routing Using Generalized Diffusing Computations",
Proc. IEEE INFOCOM, March 1998), which provides multiple loop-free paths per
destination, all of the proposed solutions are single-path algorithms. In addition, a
number of distributed routing algorithms have been proposed that use the distance and
second-to-last hop to destinations as the routing information exchanged among nodes.
These algorithms are often called path-finding algorithms or source-tracing algorithms.
One of these path-finding algorithms, referred to as LPA, appears to be more efficient
than any of the routing algorithms based on link-state information proposed to date,
while it provides loop-freedom at every instant. However, LPA, along with the other
current source-tracing algorithms, provides only a single path per destination. A couple
of routing algorithms that use partial topology information, such as LVA and ALP, have
been proposed to eliminate the main limitation of topology-broadcast algorithms. These
routing algorithms, however, do not provide loop-freedom at every instant.
Recently, MPDA has been introduced, which appears to be the first routing
algorithm based on link-state information that provides multiple paths to each
destination that are loop-free at every instant. Another algorithm, referred to as MPATH,
has been introduced, which appears to be the first path-finding algorithm that constructs
loop-free multipaths. Currently, MPDA, MPATH, and DASM appear to be the only
practical loop-free multipath routing algorithms suitable for implementation
within a near-optimal routing framework.

Therefore, a need exists for a routing protocol that allows the construction of
loop-free multipaths, even during network transitions, while still providing collision-free
communication as outlined above. The present invention satisfies those needs, as well
as others, and overcomes the deficiencies of previously developed routing protocols.
BRIEF SUMMARY OF THE INVENTION

The present invention comprises a distance vector routing methodology referred 
to as a "Multipath Distance Vector Algorithm" (MDVA) that computes the shortest 
multipath loop-free routes between each source and destination pair. In MDVA, only 
distance values are exchanged among neighboring routers. 

By way of example, and not of limitation, in MDVA the distances $D^i_j$ are
computed, such as by using a distributed Bellman-Ford algorithm (DBF), to generate a
routing graph $SG_j$. The nodes exchange messages containing distance and status
information to maintain a routing table at each node. If the distance increases for a
link, or the link status changes, then a diffusing computation is executed, which prevents
counting-to-infinity problems. Shortest-path routes are selected according to loop-free
invariant (LFI) conditions. The present invention solves a number of shortcomings
found in current distance-vector algorithms.

An object of the invention is to provide a routing protocol for creating minimum 
length multipath routes within a network. 

Another object of the invention is to provide a routing protocol for establishing 
multipath routes based on distance vectors. 

Another object of the invention is to provide a method of selecting multipath 
routing which is not subject to loops. 

Another object of the invention is to provide a method of selecting multipath 
routing which is not subject to counting-to-infinity problems. 
Another object of the invention is to provide a routing protocol wherein the routing
selections are distributed across the nodes in the given network. 

Another object of the invention is to provide a multipath routing algorithm which 
utilizes diffusing computations to enhance performance. 

Further objects and advantages of the invention will be brought out in the 
following portions of the specification, wherein the detailed description is for the purpose 
of fully disclosing preferred embodiments of the invention without placing limitations 
thereon. 

BRIEF DESCRIPTION OF THE DRAWINGS 
The invention will be more fully understood by reference to the following 

drawings which are for illustrative purposes only: 

FIG. 1 is a flowchart of the routing method according to an aspect of the present 

invention. 

FIG. 2 is pseudocode for computing distance-vectors according to an aspect of 
the present invention, shown for processing both passive and active node states. 

FIG. 3 is a topology diagram of the CAIRN network topology as utilized in 
simulations of the present invention. 

FIG. 4 is a topology diagram of the MCI network topology as utilized in 
simulations of the present invention. 

DETAILED DESCRIPTION OF THE INVENTION 

For illustrative purposes the present invention will be described with reference to 
FIG. 1 through FIG. 4. It will be appreciated that the apparatus may vary as to 
configuration and as to details of the parts, and that the method may vary as to the
specific steps and sequence, without departing from the basic concepts as disclosed 
herein. 

The present invention provides a distance vector algorithm, referred to
herein as "Multipath Distance Vector Algorithm" (MDVA), for loop-free multipath
construction.

1. Multipath Distance-Vector Algorithm (MDVA) 
1.1. Solution Strategy 

Given that a number of potential directed acyclic graphs (DAGs) exist for a given
destination within a graph, it is problematic to determine which DAG should be used
as the routing graph. The routing graph should be uniquely defined, and it should also be
easily computable by a distributed algorithm. A natural choice is the routing graph
defined by the shortest paths. Accordingly, MDVA defines
$S^i_j(t) = \{k \mid D^k_j(t) < D^i_j(t),\ k \in N^i\}$, where $D^i_j$ is the cost of the
shortest path from node $i$ to node $j$, measured as the sum of the link costs along the
path. The routing graph $SG_j$ implied by this set is unique and is referred to as the
shortest multipath. In computing $D^i_j$, distributed routing algorithms may exchange any
information, such as distance vectors or link states, although it must be ensured that
$D^i_j$ converges to the correct distances. Convergence is defined formally as follows.
Let $\mathcal{G}(t)$ denote the topology of the network as seen by an "omniscient
observer" at time $t$, and let $\mathcal{D}^i_j(t)$ denote the distance from node $i$ to
node $j$ in $\mathcal{G}(t)$; quantities defined on $\mathcal{G}$ are written in
calligraphic letters to distinguish them from the variables maintained by the nodes.
Assume that the network has a stable configuration up to a given time $t$. The network
has converged to the correct values at $t$ if $D^i_j(t) = \mathcal{D}^i_j(t)$ for all $i$
and $j$. If a sequence of link-cost changes occurs between times $t$ and $t_c$, with none
occurring after $t_c$, then the routing algorithm is said to converge if, at some time
$t_f$ with $t_c < t_f < \infty$, $D^i_j(t_f) = \mathcal{D}^i_j(t_f) = \mathcal{D}^i_j(t_c)$.
In addition, during the convergence phase, the algorithm must ensure that the graph
$SG_j$ is loop-free at every instant.
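The shortest-multipath successor sets can be illustrated with a small centralized sketch. It assumes symmetric link costs and uses Dijkstra's algorithm from the destination purely for brevity, in place of the distributed computation MDVA actually performs; the function names, node names, and costs are hypothetical.

```python
import heapq
import math


def distances_to(dest, links):
    """Dijkstra from dest over undirected links {(u, v): cost}; returns D^i_j and adjacency."""
    adj = {}
    for (u, v), c in links.items():
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))
    dist = {n: math.inf for n in adj}
    dist[dest] = 0
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, c in adj[u]:
            if d + c < dist[v]:
                dist[v] = d + c
                heapq.heappush(heap, (dist[v], v))
    return dist, adj


def shortest_multipath(dest, links):
    """S^i_j = {k in N^i : D^k_j < D^i_j} for every node i."""
    dist, adj = distances_to(dest, links)
    return {i: {k for k, _ in adj[i] if dist[k] < dist[i]} for i in adj}


example = {("a", "j"): 1, ("b", "j"): 3, ("a", "b"): 1, ("c", "a"): 2, ("c", "b"): 1}
sm = shortest_multipath("j", example)
```

Here node b (shortest distance 2, via a) keeps the direct link to j as a second successor even though that path costs 3, because $D^j_j = 0 < D^b_j = 2$; an equal-cost rule would give b only a.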

According to the distributed Bellman-Ford (DBF) algorithm, each node $i$ repeatedly
evaluates $D^i_j = \min\{D^i_{jk} + l^i_k \mid k \in N^i\}$ for a given destination $j$,
where $D^i_{jk}$ is the distance of neighbor $k$ to $j$ as reported to $i$ and $l^i_k$ is
the cost of the link from $i$ to $k$, and upon each change of $D^i_j$ it reports the new
distance to its neighbors. A known property of DBF is its rapid convergence when link
costs decrease. However, convergence can be very slow when link costs increase, and when
link failures result in network partitions the DBF algorithm may never converge. This
lack of convergence is known in the industry as the "counting-to-infinity problem".
Intuitively, the counting-to-infinity problem arises from "circular" logic in the distance
computations: a node computes its distance to a destination using a distance
communicated by a neighbor, where that distance corresponds to a path running through the
node itself. The node using this distance information is unaware of the circular logic
because the nodes exchange distance information and not path information.
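The counting-to-infinity behavior can be reproduced with a few lines of synchronous DBF simulation. This sketch assumes a three-node chain A-B-C with unit link costs, destination C, and a RIP-style cap of 16 standing in for infinity; all of these are illustrative choices, not part of the invention.

```python
INF = 16  # RIP-style stand-in for infinity (assumption for this illustration)


def dbf_round(dist, links, dest):
    """One synchronous DBF round: D_i = min{D_k + l_ik} over i's current links."""
    new = {}
    for i in dist:
        if i == dest:
            new[i] = 0
        else:
            new[i] = min([INF] + [dist[k] + c for (a, k), c in links.items() if a == i])
    return new


def run_dbf(dist, links, dest, max_rounds=100):
    """Iterate rounds until nothing changes; return the distance vector after each round."""
    history = [dict(dist)]
    for _ in range(max_rounds):
        nxt = dbf_round(dist, links, dest)
        if nxt == dist:
            break
        dist = nxt
        history.append(dict(dist))
    return history


# Chain A-B-C, destination C, unit costs; start from the converged distances,
# then remove the B-C link so that only A-B remains (directed entries both ways).
after_failure = {("A", "B"): 1, ("B", "A"): 1}
history = run_dbf({"A": 2, "B": 1, "C": 0}, after_failure, "C")
```

After the B-C link fails, B computes its distance from A's stale report (which runs through B itself), A then does the same from B, and the two climb in steps of 2 until both hit the cap many rounds later; with an uncapped metric the exchange would never terminate.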
The circular computation of distances that occurs in DBF can be prevented if
distance information is propagated along a DAG rooted at the destination. Given a DAG,
each node computes its distance using distances reported by the "downstream" nodes
and reports its distance to "upstream" nodes. This method, referred to as diffusing
computations, was first suggested by Dijkstra et al. to ensure termination of distributed
computations. A diffusing computation always terminates because of the acyclic ordering
of the nodes. DUAL, the base algorithm of EIGRP, uses diffusing computations to solve the
counting-to-infinity problem. In addition to DUAL, a number of other distance-vector
algorithms have been proposed that employ diffusing computations to overcome the
counting-to-infinity problem of DBF. The algorithm suggested by Jaffe and Moss allows
nodes to participate in multiple diffusing computations for the same destination and
requires unbounded counters, which renders the method impractical. In contrast, a node in
DUAL and DASM participates in only one diffusing computation for any destination at any
given time, and thus requires only a toggle bit. MDVA, the present invention, follows the
second approach.
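The ordering property behind diffusing computations can be illustrated, in a centralized way, with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). If every node depends only on its downstream successors, a topological order lets each node compute its distance exactly once from already-final downstream values, and a cyclic graph would be rejected with `CycleError` rather than looping. This is a sketch of the acyclic-ordering argument only, not of the query/reply message exchange; the topology and names are hypothetical.

```python
import math
from graphlib import TopologicalSorter


def diffuse_distances(dest, successors, cost):
    """Compute each node's distance once, using only its downstream successors."""
    dist = {}
    # successors[i] plays the role of graphlib's "predecessors", so every
    # downstream node is emitted before the upstream nodes that depend on it.
    for node in TopologicalSorter(successors).static_order():
        if node == dest:
            dist[node] = 0
        else:
            dist[node] = min(
                (dist[k] + cost[(node, k)] for k in successors.get(node, ())),
                default=math.inf,
            )
    return dist


succ = {"a": {"j"}, "b": {"a", "j"}, "c": {"a", "b"}}
cost = {("a", "j"): 1, ("b", "j"): 3, ("b", "a"): 1, ("c", "a"): 2, ("c", "b"): 1}
result = diffuse_distances("j", succ, cost)
```

Each node is visited once, so the computation finishes after as many steps as there are nodes, which is the termination guarantee the acyclic ordering provides.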

Two issues arise regarding diffusing computations: (1) since many potential DAGs
exist for a given destination, it is difficult to select which one to use for the
diffusing computation; and (2) diffusing computations must be implemented in a dynamic
environment in which the chosen DAG changes over time.

The following describes how these issues are resolved. Resolving the first issue is
straightforward, as the shortest multipath $SG_j$ provides a correct choice, given that
computing $SG_j$ is the final objective. The resolution of the second issue, however, is
not so trivial. A routing graph $SG_j$ used for carrying out a diffusing computation can
be allowed to change if the following conditions are met: (1) $SG_j$ is acyclic at every
instant, and (2) at any given instant, if node $i$ reports a distance through a neighbor
$k$ in $S^i_j$, it must ensure that $k$ remains in $S^i_j$ until the end of the diffusing
computation. That these conditions prevent a circular computation of distances can be
seen from the following argument. Assume that a circular computation occurs at time $t$
involving nodes $i_0, i_1, i_2, \ldots, i_m$. Let node $i_p$, where $1 \le p \le m$,
compute its distance at time $t_p < t$ using the distance reported by $i_{p-1}$, and let
$i_0$ compute its distance at time $t_0 < t$ using the distance reported by $i_m$. Because
$i_{p-1}$ is held in the successor set of $i_p$ for $1 \le p \le m$, and $i_0$ holds $i_m$
in its successor set, until the diffusing computation ends, it follows that:

$$
\begin{aligned}
i_0 \in S^{i_1}_j(t_1) &\Rightarrow i_0 \in S^{i_1}_j(t) \\
i_1 \in S^{i_2}_j(t_2) &\Rightarrow i_1 \in S^{i_2}_j(t) \\
&\ \ \vdots \\
i_{m-1} \in S^{i_m}_j(t_m) &\Rightarrow i_{m-1} \in S^{i_m}_j(t) \\
i_m \in S^{i_0}_j(t_0) &\Rightarrow i_m \in S^{i_0}_j(t)
\end{aligned}
$$

Taken together, these relations mean that $SG_j(t)$, as implied by the sets $S^i_j(t)$,
contains the cycle $i_0 \to i_m \to i_{m-1} \to \cdots \to i_1 \to i_0$. Because $SG_j(t)$
is acyclic at every instant $t$, this is a contradiction. Thus, a circular computation is
impossible when the above conditions are observed. It should be noted that the distances
are propagated along the shortest multipath $SG_j$, which is itself computed using the
distances. This "bootstrap" approach is the core of the MDVA algorithm, which involves
computing $D^i_j$ using diffusing computations along $SG_j$ while simultaneously
constructing and maintaining the routing graph $SG_j$.

In order to ensure that $SG_j$ is always loop-free, a new variable, the feasible distance
$FD^i_j$, is introduced. The feasible distance $FD^i_j$ is an "estimate" of the distance
$D^i_j$ in the sense that $FD^i_j$ equals $D^i_j$ when the network is in a stable state.
However, in order to prevent loops during network transitions, $FD^i_j$ is allowed to
differ temporarily from $D^i_j$. Let $D^i_{jk}$ be the distance of $k$ to $j$ as notified
to $i$ by $k$. To ensure loop-freedom at every instant, $FD^i_j$, $D^i_{jk}$, and $S^i_j$
must satisfy the "Loop-Free Invariant" (LFI) conditions, which were first introduced in
the context of approximating minimum-delay routing. The LFI conditions capture all
previous loop-free conditions in a unified form that simplifies protocol design and
correctness proofs:

$$FD^i_j(t) \le D^k_{ji}(t), \quad k \in N^i \tag{1}$$

$$S^i_j(t) = \{k \mid D^i_{jk}(t) < FD^i_j(t)\} \tag{2}$$

The invariant conditions (1) and (2) state that, for each destination $j$, a node $i$ can
choose as a successor a neighbor whose distance to $j$, as known to $i$, is less than the
distance of node $i$ to $j$ that is known to its neighbors.
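Conditions (1) and (2) reduce to two one-line checks at node $i$. In this sketch `reported[k]` stands for $D^i_{jk}$ and `held_by_neighbors[k]` for $D^k_{ji}$; both names, and the numbers in the example, are illustrative assumptions.

```python
def lfi_successors(fd_i, reported):
    """Condition (2): S^i_j = {k | D^i_jk < FD^i_j}."""
    return {k for k, d in reported.items() if d < fd_i}


def satisfies_condition_1(fd_i, held_by_neighbors):
    """Condition (1): FD^i_j <= D^k_ji for every neighbor k."""
    return all(fd_i <= d for d in held_by_neighbors.values())


# Node i with FD^i_j = 4: neighbors a and b report distances below 4, c does not.
succ = lfi_successors(4, {"a": 2, "b": 3.5, "c": 6})
ok = satisfies_condition_1(4, {"a": 4, "b": 5, "c": 4})
```

Neighbor b is accepted alongside a even though it is farther from $j$; membership depends only on being strictly closer than $FD^i_j$, which is what allows multiple next hops.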

Theorem 1: If the LFI conditions are satisfied at any time $t$, the graph $SG_j(t)$ implied
by the successor sets $S^i_j(t)$ is loop-free.

Proof: Let $k \in S^i_j(t)$; then from (2):

$$D^i_{jk}(t) < FD^i_j(t) \tag{3}$$

At node $k$, since node $i$ is a neighbor, (1) gives $FD^k_j(t) \le D^i_{jk}(t)$, which
combined with Eq. 3 yields:

$$FD^k_j(t) < FD^i_j(t) \tag{4}$$

Eq. 4 states that if $k$ is a successor of node $i$ on a path to destination $j$, then the
feasible distance of $k$ to $j$ is strictly less than the feasible distance of node $i$ to
$j$. Now, if the successor sets defined a loop at time $t$ with respect to $j$, then
applying Eq. 4 around the loop would give, for any node $p$ on the loop, the absurd
relation $FD^p_j(t) < FD^p_j(t)$. Therefore, the LFI conditions are sufficient to assure
loop-freedom.
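Theorem 1 can also be spot-checked empirically. The sketch below (Python 3.9+) draws random feasible distances, fills in reported distances that may be stale but are never below the sender's feasible distance, as condition (1) requires, builds the successor sets with condition (2), and confirms that the resulting graph has no cycle. The random model and all names are assumptions for illustration; a passing run is a sanity check, not a proof.

```python
import random
from graphlib import TopologicalSorter, CycleError


def successor_sets(nodes, fd, reported):
    """Condition (2) at every node; reported[(i, k)] is D^i_jk."""
    succ = {n: set() for n in nodes}
    for (i, k), d in reported.items():
        if d < fd[i]:
            succ[i].add(k)
    return succ


def is_acyclic(succ):
    try:
        TopologicalSorter(succ).prepare()
        return True
    except CycleError:
        return False


def random_snapshot(nodes, rng):
    """FD values plus reported distances obeying condition (1): D^i_jk >= FD^k_j."""
    fd = {n: rng.randint(0, 10) for n in nodes}
    reported = {(i, k): fd[k] + rng.randint(0, 3)  # stale by up to 3, never too low
                for i in nodes for k in nodes if i != k}  # complete graph
    return fd, reported


def theorem1_spot_check(trials=1000, size=8, seed=1):
    rng = random.Random(seed)
    nodes = list(range(size))
    return all(is_acyclic(successor_sets(nodes, *random_snapshot(nodes, rng)))
               for _ in range(trials))
```

Dropping condition (1) breaks the guarantee: with $FD^a_j = FD^b_j = 5$ and each node told that the other is at distance 4, `successor_sets(["a", "b"], {"a": 5, "b": 5}, {("a", "b"): 4, ("b", "a"): 4})` makes a and b successors of each other, a two-node loop.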
The above theorem suggests that any distributed routing protocol, whether link-state or
distance-vector, that attempts to determine loop-free shortest multipaths is required to
compute $D^i_j$, $FD^i_j$, and $S^i_j$ such that the LFI conditions are satisfied, and such
that at convergence $D^i_j = FD^i_j$ = the minimum distance from $i$ to $j$.
1.2. Algorithm Description

FIG. 1 depicts the general flow of the method of the present invention. Distances
$D^i_j$ are computed at block 10 to generate a routing graph $SG_j$. The nodes in
the network exchange distance and status information as per block 12. If a distance
increase is detected at block 14, then a diffusing computation is performed as shown in
block 16. The distance and status information is used to maintain routing tables within
each node as per block 18, so that a loop-free route is selected according to the
loop-free invariant conditions as shown in block 20.

The MDVA algorithm uses DBF to compute the distance $D^i_j$, and thus the routing
graph $SG_j$, while always propagating distances along the routing graph $SG_j$ to prevent
counting-to-infinity problems and to ensure termination. Each node maintains
a main table containing $D^i_j$, the distance of node $i$ to destination $j$. For each
destination $j$, the table also stores the successor set $S^i_j$, the feasible distance
$FD^i_j$, the reported distance $RD^i_j$, and the shortest distance possible through the
successor set $S^i_j$, called the best distance $SD^i_j$. In addition, the table stores
$QS^i_j \subseteq S^i_j$, the set of neighbors involved in a diffusing computation. Each
node maintains a neighbor table for each neighbor $k$ that contains $D^i_{jk}$, the
distance of neighboring node $k$ to node $j$ as communicated by node $k$. A link table
stores the link cost $l^i_k$ of the adjacent link to each neighbor $k$. If a link is down,
its link cost is considered to increase to infinity, and the distance to unreachable nodes
is also considered to be infinity.
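One possible in-memory layout for these tables is sketched below. The class and field names are hypothetical; the text specifies only what each table holds, not how it is stored.

```python
import math
from dataclasses import dataclass, field

INF = math.inf  # unreachable destinations and failed links use infinity


@dataclass
class MainEntry:
    """Main-table row for one destination j at node i."""
    d: float = INF                            # D^i_j
    fd: float = INF                           # FD^i_j
    rd: float = INF                           # RD^i_j
    sd: float = INF                           # SD^i_j, best distance through S^i_j
    succ: set = field(default_factory=set)    # S^i_j
    qs: set = field(default_factory=set)      # QS^i_j, a subset of S^i_j


@dataclass
class NodeTables:
    """Main, neighbor and link tables of node i (hypothetical layout)."""
    main: dict = field(default_factory=dict)      # j -> MainEntry
    neighbor: dict = field(default_factory=dict)  # k -> {j: D^i_jk}
    link: dict = field(default_factory=dict)      # k -> l^i_k

    def entry(self, j):
        """Main-table row for j, created with infinite distances on first use."""
        return self.main.setdefault(j, MainEntry())

    def reported(self, k, j):
        """D^i_jk, or infinity if neighbor k has not reported j."""
        return self.neighbor.get(k, {}).get(j, INF)

    def best_distance(self, j):
        """min{D^i_jk + l^i_k | k in N^i}."""
        return min((self.reported(k, j) + c for k, c in self.link.items()), default=INF)
```

Storing a failed link as cost `INF` lets `best_distance` ignore it automatically, matching the convention that a down link's cost increases to infinity.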

Nodes executing the MDVA algorithm exchange information using messages
containing at least one entry of the form [type, $j$, $d$], where $d$ is the distance of the
node sending the message to destination $j$. The type field identifies the entry as a
QUERY, UPDATE, or REPLY, or an equivalent. It is assumed that messages transmitted
over an operational link are received without errors and in the proper sequence, and
that the messages are processed in the order received.

Nodes invoke the procedure ProcessDistVect shown in FIG. 2 to process a
distance vector when an event occurs. An event may be the arrival of a message, a change
in the cost of an adjacent link, or a change in the status (up/down) of an adjacent link.
When an adjacent link is brought up, the node sends an update message
[UPDATE, $j$, $RD^i_j$] for each destination $j$ over the link. When an adjacent link
$(i, m)$ fails, the neighbor table associated with neighbor $m$ is cleared and the cost of
the link is set to infinity. Then, for each destination $j$, the procedure
ProcessDistVect(UPDATE, $m$, $\infty$, $j$) is invoked. Similarly, when the cost of an
adjacent link to $m$ changes, the cost $l^i_m$ is set to the new cost and
ProcessDistVect(UPDATE, $m$, $D^i_{jm}$, $j$) is invoked for each destination $j$. When a
message is received, ProcessDistVect() is invoked for each entry of the message.
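The event handling just described maps naturally onto a small dispatcher. In this sketch `process(kind, k, d, j)` stands in for ProcessDistVect and `send(k, entries)` for message transmission; both callbacks, the class name, and the dictionary-based tables are assumptions, not the FIG. 2 interface.

```python
import math

INF = math.inf


class EventDispatcher:
    """Hypothetical glue turning link and message events into ProcessDistVect calls."""

    def __init__(self, process, send, destinations):
        self.process = process            # stand-in for ProcessDistVect(kind, k, d, j)
        self.send = send                  # stand-in for transmitting entries to neighbor k
        self.destinations = destinations
        self.link = {}                    # k -> l^i_k
        self.neighbor = {}                # k -> {j: D^i_jk}
        self.rd = {}                      # j -> RD^i_j

    def link_up(self, m, cost):
        """New link: advertise [UPDATE, j, RD^i_j] for every destination."""
        self.link[m] = cost
        self.send(m, [("UPDATE", j, self.rd.get(j, INF)) for j in self.destinations])

    def link_down(self, m):
        """Failed link: clear m's neighbor table, set cost to infinity, reprocess."""
        self.neighbor.pop(m, None)
        self.link[m] = INF
        for j in self.destinations:
            self.process("UPDATE", m, INF, j)

    def link_cost_changed(self, m, cost):
        """Cost change: reprocess m's last reported distance for every destination."""
        self.link[m] = cost
        for j in self.destinations:
            self.process("UPDATE", m, self.neighbor.get(m, {}).get(j, INF), j)

    def message(self, k, entries):
        """Incoming message: one ProcessDistVect call per [type, j, d] entry."""
        for kind, j, d in entries:
            self.neighbor.setdefault(k, {})[j] = d
            self.process(kind, k, d, j)
```

Recording the callbacks in lists, as in a unit test, is enough to see that a link failure produces one UPDATE with distance infinity per destination.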
A node initializes the distance values in its tables to infinity and its sets to empty at
startup. Because the distances can be computed independently for each destination, the
remainder of this description explains the operation of the algorithm with respect to a
particular destination $j$. A node can be in the ACTIVE or PASSIVE state with respect to a
destination $j$, as represented by a variable state. A node is considered active when it
is engaged in a diffusing computation. Assume first that all nodes are PASSIVE. While link
costs decrease, MDVA essentially operates like DBF, because the condition on line 9 always
fails and lines 17-24 are always executed. ProcessDistVect() operates in such a way that
when the node is in the PASSIVE state, the condition
$D^i_j = FD^i_j = RD^i_j = \min\{D^i_{jk} + l^i_k \mid k \in N^i\}$ always holds, as can
be seen from lines 8 and 23. However, if the distance to a destination increases, either
because the cost of an adjacent link changes or because a message is received from a
neighbor, the condition on line 9 succeeds and the node engages in a diffusing
computation. This is accomplished by sending query messages to all the neighbors carrying
the best distance through the subset of neighbors $S^i_j$, namely $SD^i_j$, and waiting
for the neighbors to reply (lines 14-15). The node is said to be in the ACTIVE state while
it is waiting for the replies. If the increase in distance is due to a query from a
successor, that neighbor is added to $QS^i_j$ so that a reply can be given when the node
transitions to the PASSIVE state.

When all replies are received, the node can be sure that the neighbors have the
distances that the node reported, and it is ready to transition to the PASSIVE state. At
this point, $FD^i_j$ can be increased and new neighbors can be added to $S^i_j$ without
violating the LFI conditions.

If a node in the ACTIVE state receives a query message from a neighbor that is not in
its successor set, a reply is given immediately. However, if the query is from a neighbor
$m$ in $S^i_j$, a test is performed to verify whether $SD^i_j$ increased beyond the
previously reported distance (line 28). If it did not, a reply is sent immediately. If
$SD^i_j$ did increase, the query is blocked by adding $m$ to $QS^i_j$ and no reply is
given. Replies to neighbors in $QS^i_j$ are deferred until the node is ready to transition
to the PASSIVE state. After receiving all replies, the ACTIVE phase can either end or
continue. If the distance $D^i_j$ has increased again by the time all replies are
received, the ACTIVE phase is extended by sending a new set of queries; otherwise, the
ACTIVE phase terminates. If the ACTIVE phase continues, no replies are issued to the
pending queries in $QS^i_j$. Otherwise, all replies are given and the node transitions to
the PASSIVE state, satisfying the PASSIVE-state invariant
$D^i_j = FD^i_j = RD^i_j = \min\{D^i_{jk} + l^i_k \mid k \in N^i\}$.
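The PASSIVE/ACTIVE behavior for a single destination can be approximated by the following sketch. It is a simplified reading of the prose above, not a transcription of FIG. 2: it treats "the distance increased" as $\min_k\{D^i_{jk} + l^i_k\} > RD^i_j$, recomputes $S^i_j = \{k \mid D^i_{jk} < FD^i_j\}$ after every event so that condition (2) always holds, keeps $FD^i_j$ frozen during an ACTIVE phase, and collects outgoing messages in a list instead of sending them. Class, method, and field names are hypothetical.

```python
import math
from dataclasses import dataclass, field

INF = math.inf
UPDATE, QUERY, REPLY = "UPDATE", "QUERY", "REPLY"


@dataclass
class DestState:
    """Hypothetical per-destination MDVA state at node i (simplified)."""
    cost: dict                                     # l^i_k for each neighbor k
    dist_from: dict = field(default_factory=dict)  # D^i_jk reported by neighbor k
    fd: float = INF                                # FD^i_j
    rd: float = INF                                # RD^i_j
    succ: set = field(default_factory=set)         # S^i_j
    qs: set = field(default_factory=set)           # QS^i_j (blocked queries)
    awaiting: set = field(default_factory=set)     # neighbors whose replies are pending
    active: bool = False
    outbox: list = field(default_factory=list)     # (type, neighbor, distance)

    def __post_init__(self):
        for k in self.cost:
            self.dist_from.setdefault(k, INF)

    def best(self):
        """D^i_j = min{D^i_jk + l^i_k}."""
        return min((self.dist_from[k] + c for k, c in self.cost.items()), default=INF)

    def sd(self):
        """SD^i_j: best distance through the current successor set."""
        return min((self.dist_from[k] + self.cost[k] for k in self.succ), default=INF)

    def _refresh_succ(self):
        self.succ = {k for k in self.cost if self.dist_from[k] < self.fd}  # condition (2)

    def _send_all(self, kind, d):
        self.outbox.extend((kind, k, d) for k in self.cost)

    def _settle(self):
        """Become (or stay) PASSIVE with D = FD = RD and release blocked queries."""
        best = self.best()
        if best != self.rd:
            self._send_all(UPDATE, best)
        self.fd = self.rd = best
        self._refresh_succ()
        self.outbox.extend((REPLY, k, self.rd) for k in sorted(self.qs))
        self.qs.clear()
        self.active = False

    def _query(self):
        """Start or extend an ACTIVE phase, reporting SD^i_j; FD stays frozen."""
        self.rd = self.sd()
        self._send_all(QUERY, self.rd)
        self.awaiting = set(self.cost)
        self.active = True

    def process(self, kind, k, d):
        """Handle one (kind, d) entry from neighbor k for this destination."""
        was_succ = k in self.succ
        self.dist_from[k] = d
        self._refresh_succ()
        if kind == REPLY:
            self.awaiting.discard(k)
        if not self.active and self.best() <= self.rd:
            self._settle()                     # no increase: behaves like DBF
            if kind == QUERY:
                self.outbox.append((REPLY, k, self.rd))
            return
        if not self.active:
            self._query()                      # distance increased: diffusing computation
            if kind == QUERY and was_succ:
                self.qs.add(k)                 # reply deferred until PASSIVE
            elif kind == QUERY:
                self.outbox.append((REPLY, k, self.rd))
        elif kind == QUERY:
            if was_succ and self.sd() > self.rd:
                self.qs.add(k)                 # successor query that raises SD: block it
            else:
                self.outbox.append((REPLY, k, self.rd))
        if self.active and not self.awaiting:
            if self.best() > self.rd:
                self._query()                  # distance grew again: extend the phase
            else:
                self._settle()
```

For example, with neighbors a and b at unit link cost, reports of 1 from a and 1.5 from b give $FD^i_j = 2$ and $S^i_j = \{a, b\}$. A query from a raising its distance to 5 starts an ACTIVE phase with $FD^i_j$ still 2, $S^i_j = \{b\}$, queries carrying 2.5, and a's query held in $QS^i_j$; once both replies arrive the node settles at $FD^i_j = RD^i_j = 2.5$ and answers a.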

2. Verifying Correctness of MDVA 

The correctness of MDVA is proven for two scenarios: (1) link costs only decrease,
and (2) some link costs increase, so that some distances increase. MDVA operates in a
similar manner to DBF when link costs only decrease, and the same proofs used for DBF
apply. To state this formally, assume that the network is stable before time $t$, so that
all nodes have obtained correct distances, and that at time $t$ the costs of some of the
links decrease. Since the distances in the tables then satisfy
$D^i_j(t) \ge \mathcal{D}^i_j(t)$, within some finite time $t'$, $t < t' < \infty$,
$D^i_j(t') = \mathcal{D}^i_j(t)$. The distinction between $\mathcal{D}^i_j$ and $D^i_j$
should be noted: $\mathcal{D}^i_j$ is the correct distance, while $D^i_j$ is just a local
variable at node $i$ and is an estimate of $\mathcal{D}^i_j$. With the present routing
protocol, $D^i_j$ must eventually equal $\mathcal{D}^i_j$, barring continuous changes to
$\mathcal{D}^i_j$.

When some link costs increase, so that the distances between some source-destination
pairs increase, MDVA and DBF behave differently. In this case,
$D^i_j(t) < \mathcal{D}^i_j(t)$ for some $i$ and $j$. Both DBF and MDVA first increase
$D^i_j$ to a value of at least $\mathcal{D}^i_j(t)$, after which the distances
monotonically decrease until they converge to the correct distances. MDVA and DBF,
however, differ in how they increase the distances. DBF performs the increase step by
step, in small bounded increments, until $D^i_j(t) \ge \mathcal{D}^i_j(t)$. Unfortunately,
when $\mathcal{D}^i_j(t) = \infty$, counting to infinity is encountered. In contrast, MDVA
executes diffusing computations to quickly raise $D^i_j$ so that
$D^i_j \ge \mathcal{D}^i_j(t)$, after which it functions as in the scenario described
above, and the distances converge to the correct values.

In summary, to show that MDVA terminates correctly, it suffices to show that (1) the
routing graph $SG_j$ is loop-free at every instant; (2) every diffusing computation using
the routing graph $SG_j$ completes in finite time; and (3) only a finite number of
diffusing computations are executed. After all diffusing computations have completed, the
MDVA algorithm behaves like conventional DBF.
Theorem 2: For a given destination $j$, the routing graph $SG_j$ constructed by MDVA
is loop-free at every instant.

Proof: The proof proceeds by showing that the LFI conditions are satisfied during every
ACTIVE and PASSIVE phase. Let $t_n$ be the time when the $n$-th transition to the ACTIVE
state starts at node $i$ for destination $j$. The proof is by induction on $t_n$. At node
initialization time 0, all distance variables are initialized to infinity, and hence
$FD^i_j(0) \le D^k_{ji}(0)$ for $k \in N^i$. Assume that the LFI conditions hold up to
time $t_n$:

$$FD^i_j(t) \le D^k_{ji}(t), \quad t \in [0, t_n] \tag{5}$$

At any time $t$, from lines 6, 8, 14 and 23 of the pseudocode in FIG. 2, and because
$SD^i_j(t) \ge D^i_j(t)$, it follows that:

$$FD^i_j(t) \le RD^i_j(t) \tag{6}$$

and therefore, at $t_{n-1}$ and $t_n$:

$$FD^i_j(t_{n-1}) \le RD^i_j(t_{n-1}) \tag{7}$$

$$FD^i_j(t_n) \le RD^i_j(t_n) \tag{8}$$

Let queries be sent at $t_n$, the start time of the $n$-th ACTIVE phase, and be received
at a particular neighbor $k$ at $t' > t_n$. From Eq. 6, and from the fact that any update
messages sent between $t_{n-1}$ and $t_n$ report non-increasing distances, it follows
that:

$$FD^i_j(t) \le D^k_{ji}(t), \quad t \in [t_n, t'] \tag{9}$$

Let $t''$ denote the time when all replies are received and the ACTIVE phase ends. During
the ACTIVE phase the value of $FD^i_j$ remains unchanged and no new $RD^i_j$ is reported
(lines 27-31), while during the PASSIVE phase only decreasing values of $RD^i_j$ are
reported. The following may then be derived from Eq. 8:

$$FD^i_j(t) \le D^k_{ji}(t), \quad t \in [t', t''] \tag{10}$$

Irrespective of whether the node transitions to the PASSIVE state or continues in the
ACTIVE phase, at time $t''$ the following holds from Eq. 6:

$$FD^i_j(t'') \le RD^i_j(t'') \tag{11}$$
If the ACTIVE phase terminates at $t''$, then $FD^i_j(t) \le D^k_{ji}(t)$ for
$t \in [t_n, t'']$. In the PASSIVE state, $RD^i_j$ can only decrease until the next ACTIVE
phase at $t_{n+1}$. Therefore, the LFI conditions are satisfied in the interval
$[t_n, t_{n+1}]$. Alternatively, if the ACTIVE state continues, new queries are sent at
$t''$. Assuming that all replies to these queries are received at $t'''$, a similar
argument shows that $FD^i_j(t) \le D^k_{ji}(t)$ for $t \in [t_n, t''']$. Therefore,
irrespective of the duration of the ACTIVE phase, the invariant holds in the interval
$[t_n, t_{n+1}]$. Consequently, by induction, the LFI conditions hold at all times, and it
follows from Theorem 1 that the routing graph $SG_j$ is loop-free at all times.

Lemma 1: Every ACTIVE phase has a finite duration.

Proof: An ACTIVE phase could fail to end only because of "deadlock" or "livelock".
A node transitioning to the ACTIVE state with respect to a given destination transmits
queries. If the transition occurs as a result of a query from a successor, the node defers
its reply to that query until it receives the replies to its own queries. Because nodes
wait for replies to their own queries before replying to a query from a neighbor,
"circular" waits could arise, and circular waits can lead to deadlock. In the present
invention, however, circular waits are prevented for the following reasons. First, a node
in the PASSIVE state immediately replies to a query from a predecessor (line 19). Second,
if the query is from a successor and potentially increases $SD^i_j$, and the node is
ACTIVE, the query is held only until the ACTIVE phase ends (line 29). Because the routing
graph $SG_j$ is loop-free at every instant, as shown in the proof of Theorem 2, a deadlock
cannot occur. Thus a node issuing queries to its neighbors will eventually receive all the
replies and transition to the PASSIVE state.

A livelock is a situation in which a node endlessly goes through back-to-back
ACTIVE phases without ever being able to reply to the pending queries from its
successors. A livelock is also impossible in the present system, for the following
reasons. A transition to the ACTIVE phase occurs either because of a query from a
successor or because of a cost increase on an adjacent link. A query from a successor is
blocked only if it increases the best distance $SD^i_j$. Since links can change cost only a
finite number of times, and each node has a finite number of neighbors from which it can
receive queries, the node can enter only a finite number of back-to-back ACTIVE phases.
The node therefore eventually sends all pending replies and enters the PASSIVE state, so
livelock is not possible.

Lemma 2: A node can have only a finite number of ACTIVE phases.

Proof: Assume, for the sake of contradiction, that there exists a node that goes through an infinite number of PASSIVE-to-ACTIVE transitions. An ACTIVE phase transition occurs either because of a query from a successor or because of a cost increase on an adjacent link. Because link costs can change only a finite number of times, the infinite number of PASSIVE-to-ACTIVE transitions must be triggered by an infinite number of queries from a neighbor. Let that neighbor be node k. By the same argument, node k is sending infinitely many queries because it is receiving infinitely many queries. However, this argument cannot be continued indefinitely, because there are only a finite number of nodes in the network. Since the reply to the neighbor in the successor set causing the phase transition is blocked, and the routing graphs are loop-free at every instant (Theorem 2), there must exist a node that transitions to the ACTIVE state only because of adjacent link-cost changes. This implies that a link changes cost an infinite number of times, which contradicts the assumption; hence a node cannot have infinitely many ACTIVE phases.

Theorem 3: After a finite sequence of link-cost changes in the network, the distances $D_j^i$ converge to the final correct values $\mathcal{D}_j^i$.

Proof: Assume that at time 0 every node has correct distance values, i.e., $D_j^i(0) = \mathcal{D}_j^i(0)$. Assume that a finite number of link-cost changes, link failures and link recoveries occur in the network between time 0 and time $t_c$, and that no further changes occur after $t_c$. It must be shown that there is some time $t_f$, with $t_c < t_f < \infty$, at which all nodes have converged to the correct distances, given by $D_j^i(t_f) = \mathcal{D}_j^i(t_c) = \mathcal{D}_j^i(t_f)$.

From Lemmas 1 and 2, it follows that within a finite time after the last link change all nodes transition to the PASSIVE state and remain in the PASSIVE state thereafter. Let $t'$ be the time when the last ACTIVE phase ends in the network. The following two claims are to be proven.

1. $D_j^i(t') \ge \mathcal{D}_j^i(t_c)$ for every i and j.

2. In the time period between $t'$ and $t_f$, every distance $D_j^i$ decreases monotonically and converges at time $t_f$ to the correct distance $\mathcal{D}_j^i(t_c)$, so that $D_j^i(t_f) = \mathcal{D}_j^i(t_c)$.

Proof, Part 1: Assume towards a contradiction that $D_j^i(t') < \mathcal{D}_j^i(t_c)$. Let $D_j^i(t') = l_k^i(t') + D_{jk}^i(t')$ for some $k \in K$. Assume $D_j^k(t') \ge \mathcal{D}_j^k(t_c)$, and that K has only one element. Because $\mathcal{D}_j^i(t_c) \le l_k^i(t_c) + \mathcal{D}_j^k(t_c)$ (the shortest distance from i cannot exceed the distance through any neighbor k), we have $l_k^i(t') + D_{jk}^i(t') < l_k^i(t_c) + D_j^k(t')$, from which we can infer that either $l_k^i(t') < l_k^i(t_c)$ or $D_{jk}^i(t') < D_j^k(t')$, or both. If $l_k^i(t') < l_k^i(t_c)$, the cost of link (i, k) has not yet been increased to $l_k^i(t_c)$ by a link-cost change event. When it is, the condition on line 9 becomes true and an ACTIVE state transition is triggered, so not all ACTIVE phases have terminated. Similarly, if $D_{jk}^i(t') < D_j^k(t')$, then messages are in transit that, when processed by node i, would trigger a PASSIVE-to-ACTIVE transition. In either case the ACTIVE phases have not ended, which contradicts the definition of $t'$ as the time at which the last ACTIVE phase ends; hence the assumption is false. Therefore, when the ACTIVE phases end, $D_j^i(t') \ge \mathcal{D}_j^i(t_c)$. When K has more than one element, each element is removed from the successor set in turn without triggering an ACTIVE transition until the last element, at which point the ACTIVE state transition finally occurs.
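Written out as a single chain, the Part 1 inequality reads as follows. This is a restatement under the same assumptions as the proof, namely $D_j^k(t') \ge \mathcal{D}_j^k(t_c)$ and that $\mathcal{D}_j^i(t_c)$ is no larger than the distance through neighbor k:

```latex
\begin{aligned}
l_k^i(t') + D_{jk}^i(t') &= D_j^i(t') \\
  &< \mathcal{D}_j^i(t_c) \\
  &\le l_k^i(t_c) + \mathcal{D}_j^k(t_c) \\
  &\le l_k^i(t_c) + D_j^k(t').
\end{aligned}
```

Comparing the two outer sums term by term, at least one of $l_k^i(t') < l_k^i(t_c)$ or $D_{jk}^i(t') < D_j^k(t')$ must hold; these are exactly the two cases the proof rules out.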

Proof, Part 2: After every node becomes PASSIVE at time $t'$, the messages in transit can only decrease the distances; otherwise, they would cause a transition to the ACTIVE state. At this stage MDVA works essentially like DBF, and the convergence proof for DBF applies. Each time a distance is decreased, the new distance is reported. The distances eventually converge, because they cannot decrease forever and are bounded below by $\mathcal{D}_j^i(t_c)$.
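The behavior claimed in Part 2 can be simulated with synchronous, decrease-only DBF rounds. The sketch assumes, as Part 1 establishes, that every starting distance is at or above its correct value, and lets a node only lower its distance, mirroring the fact that any increase would have triggered an ACTIVE phase. The topology, the link costs and the function names are hypothetical; Dijkstra's algorithm supplies the reference distances.

```python
import heapq
import math
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # graph[i][k] = cost l_k^i of link (i, k)

def shortest_to(graph: Graph, dest: str) -> Dict[str, float]:
    """Reference distances to dest: Dijkstra run over reversed links."""
    rev: Graph = {v: {} for v in graph}
    for i, nbrs in graph.items():
        for k, cost in nbrs.items():
            rev[k][i] = cost                     # link (i, k) lets i reach dest via k
    dist = {v: math.inf for v in graph}
    dist[dest] = 0.0
    heap = [(0.0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                             # stale heap entry
        for i, cost in rev[u].items():
            if d + cost < dist[i]:
                dist[i] = d + cost
                heapq.heappush(heap, (dist[i], i))
    return dist

def decrease_only_rounds(graph: Graph, dest: str, start: Dict[str, float]) -> List[Dict[str, float]]:
    """Synchronous DBF-style rounds in which a distance may only decrease:
    D_j^i <- min(D_j^i, min_k l_k^i + D_j^k). Assumes start[i] >= the correct distance."""
    history = [dict(start)]
    while True:
        prev = history[-1]
        nxt = {}
        for i in graph:
            via = min((cost + prev[k] for k, cost in graph[i].items()), default=math.inf)
            nxt[i] = 0.0 if i == dest else min(prev[i], via)
        if nxt == prev:                          # no node changed: converged
            return history
        history.append(nxt)

# Symmetric example topology; costs are illustrative only.
links = {("a", "b"): 1.0, ("b", "j"): 1.0, ("a", "j"): 5.0, ("c", "a"): 2.0, ("c", "j"): 10.0}
graph: Graph = {v: {} for v in "abcj"}
for (u, v), cost in links.items():
    graph[u][v] = cost
    graph[v][u] = cost

# Distances at t': at or above the correct values, e.g. left over from earlier cost increases.
history = decrease_only_rounds(graph, "j", {"a": 5.0, "b": 3.0, "c": 10.0, "j": 0.0})
print(history[-1])                               # {'a': 2.0, 'b': 1.0, 'c': 4.0, 'j': 0.0}
```

Every round is component-wise no larger than the previous one, and the values stop at the shortest-path distances, which act as the lower bound $\mathcal{D}_j^i(t_c)$.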

3. Evaluating the Performance of MDVA 

The storage complexity is determined by the amount of table space needed by any given node. Each of the $|N^i|$ neighbor tables and the main distance table holds an entry for every destination, so together they have size of the order $O(|N^i||N|)$, which is therefore the storage complexity. The computation complexity is the time taken to process a distance vector, and it is easy to see that processDistVector() requires execution time given by . The time complexity is the time it takes for the network to converge after a set of link-cost changes occurs within the network. The communication complexity is the amount of message overhead required for propagating a set of link-cost changes. In a dynamic environment, the timing and range of link-cost changes follow complex patterns and are often determined by the nature of the traffic on the network. Thus, closed-form expressions for time complexity and communication complexity cannot be obtained, and only approximations are provided for the case in which communication is synchronous throughout the network.
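The table layout behind the storage figure can be sketched as follows. The class and method names are hypothetical, and `process_dist_vector` is a deliberately naive stand-in that recomputes every destination as a minimum over neighbors; it ignores the feasible-distance and successor-set bookkeeping of MDVA, and the source's processDistVector() may be organized differently. It counts stored entries, $(|N^i| + 1)|N|$, and the number of sum terms the naive recomputation evaluates.

```python
from typing import Dict, List

INF = float("inf")

class RouterTables:
    """Hypothetical tables at one router i: a distance table per neighbor k in N^i
    and a main distance table, each with one entry per destination in N."""

    def __init__(self, destinations: List[str], link_costs: Dict[str, float]):
        self.destinations = list(destinations)
        self.link_costs = dict(link_costs)        # l_k^i for each neighbor k
        self.neighbor_tables = {k: {j: INF for j in destinations} for k in link_costs}
        self.main = {j: INF for j in destinations}

    def entries(self) -> int:
        """Stored distances: (|N^i| + 1) * |N|, i.e. O(|N^i||N|)."""
        return sum(len(t) for t in self.neighbor_tables.values()) + len(self.main)

    def process_dist_vector(self, k: str, vector: Dict[str, float]) -> int:
        """Naive handling of a vector from neighbor k: store it, then recompute
        every destination as the min over neighbors n of l_n^i + D_jn^i.
        Returns how many (cost + distance) terms were evaluated."""
        self.neighbor_tables[k].update(vector)
        evaluated = 0
        for j in self.destinations:
            best = INF
            for n, table in self.neighbor_tables.items():
                best = min(best, self.link_costs[n] + table[j])
                evaluated += 1
            self.main[j] = best
        return evaluated

router = RouterTables(["a", "b", "c", "d", "e"], {"a": 1.0, "b": 2.0})
print(router.entries())                                       # 15
print(router.process_dist_vector("a", {"a": 0.0, "c": 3.0}))  # 10
```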

Accordingly, simulations are used to compare the worst-case performance of MDVA, in terms of control overhead and convergence times, with that of DBF and MPATH. The purpose of these simulations is to give qualitative explanations for the behavior and performance of MDVA. DBF was chosen as a benchmark because it does not use diffusing computations and yet is based on vectors of distances. MPATH was chosen because it has been shown to be very efficient, in terms of communication overhead and convergence times, compared with prior algorithms based on link-state information and distance information, such as topology broadcast, DASM, LVA and ALP. Thus DBF and MPATH represent two ends of the performance spectrum.

MDVA achieves loop-freedom through diffusing computations that, in some cases, may span the whole network. In contrast, MPATH uses only neighbor-to-neighbor synchronization. It is therefore of interest to see how convergence times are affected by these synchronization mechanisms. It is also not obvious how the control message overheads of MDVA and MPATH compare.

The performance metrics used for comparison are the control message overhead and the convergence times. Computation times are assumed to be negligible relative to communication times. The simulator used was an event-driven real-time simulator called CPT. Simulations are performed on the CAIRN and MCI topologies shown in FIG. 3 and FIG. 4, respectively. The bandwidth and propagation delay of each link are given in parentheses next to the topology. In backbone networks, links and nodes are highly reliable and change status much less frequently than link costs, which are a function of the traffic on the link. This is particularly true when near-optimal delay routing is used, in which link costs are periodically measured and reported. For these reasons, the algorithms are compared when multiple link-cost changes occur.
Link costs are chosen randomly within a range, link-cost change events are triggered, and the algorithms are then allowed to converge. The worst-case message overhead and convergence times are shown in Table 2 and Table 3, respectively. MDVA improves on DBF because it uses diffusing computations for increasing distances. MPATH performed better than MDVA in most instances, although MDVA at times outperformed MPATH, as can be seen in the convergence time for MCI (0.1 ms, 10 Mb/s). This generally occurs when most link-cost changes are decreases, because distance-vector algorithms are known to converge rapidly when link costs decrease.
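The comparisons in the preceding paragraph can be recomputed directly from the worst-case figures in Table 2 and Table 3. The short script below copies those numbers verbatim, computes MDVA's percentage reduction in message load relative to DBF, and identifies the fastest-converging algorithm in each scenario; the helper names are illustrative only.

```python
# Worst-case figures copied from Table 2 (message load, bits) and Table 3 (convergence time, ms).
SCENARIOS = ["MCI 10 ms", "MCI 0.1 ms", "CAIRN 10 ms", "CAIRN 0.1 ms"]
overhead = {
    "DBF":   [62568, 78624, 39648, 37208],
    "MDVA":  [52352, 52840, 14056, 12992],
    "MPATH": [32408, 32408, 6176, 5640],
}
convergence = {
    "DBF":   [330.51, 4.36, 470.61, 4.07],
    "MDVA":  [250.46, 2.51, 170.31, 2.14],
    "MPATH": [190.72, 2.62, 150.32, 1.82],
}

def reduction(table, better, baseline):
    """Percent reduction of `better` relative to `baseline`, per scenario."""
    return [round(100 * (b - x) / b, 1) for x, b in zip(table[better], table[baseline])]

def best(table):
    """Name of the algorithm with the smallest value in each scenario."""
    return [min(table, key=lambda alg: table[alg][s]) for s in range(len(SCENARIOS))]

print(reduction(overhead, "MDVA", "DBF"))   # [16.3, 32.8, 64.5, 65.1]
print(best(convergence))                    # ['MPATH', 'MDVA', 'MPATH', 'MPATH']
```

MPATH has the lowest message load in every scenario, while MDVA converges fastest only in the MCI (0.1 ms) case, consistent with the discussion above.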

Accordingly, it will be seen that this invention presents a new distributed 
distance-vector routing algorithm which provides multiple next-hop choices for each 
destination wherein the routing graphs implied by the multiple next-hop choices are 
always loop-free. The present invention utilizes a set of loop-free invariant conditions 
that ensure correct termination of the algorithm and eliminate counting-to-infinity 
problems. The multiple successors that MDVA makes available at each node can be 
used for traffic load-balancing. Work on other known algorithms, such as MPDA, has shown that loop-free multiple paths are necessary in order to minimize the delays encountered within the network. It will be appreciated, therefore, that MDVA can be
utilized as an alternative to MPDA to approximate minimum-delay routing in networks. 
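The text does not specify how traffic is divided among the successors in $S_j^i$. As a hypothetical illustration only, the sketch below swaps in generic per-flow hashing (the technique commonly used for equal-cost multipath forwarding, not the method of MDVA or MPDA): all packets of a flow take the same next hop, while different flows spread across the successor set. Because every successor belongs to a loop-free $SG_j$, any choice among them is loop-free. The flow-identifier format is an assumption.

```python
import hashlib
from typing import Sequence

def pick_successor(flow_id: str, successors: Sequence[str]) -> str:
    """Map a flow onto one member of the successor set S_j^i.
    Packets of the same flow always take the same next hop; different flows
    spread across all successors."""
    if not successors:
        raise ValueError("empty successor set")
    ordered = sorted(successors)                       # deterministic order
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]

successors = {"b", "c"}                                # e.g. S_j^a from a loop-free SG_j
counts = {k: 0 for k in successors}
for n in range(1000):
    counts[pick_successor(f"10.0.0.1->10.0.1.{n % 250}:{n}", successors)] += 1
print(counts)
```

Weighted splitting, as used for minimum-delay routing, would replace the uniform modulo with weights per successor; that refinement is outside this sketch.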

Although the description above contains many specificities, these should not be 
construed as limiting the scope of the invention but as merely providing illustrations of 
some of the presently preferred embodiments of this invention. Therefore, it will be 
appreciated that the scope of the present invention fully encompasses other 
embodiments which may become obvious to those skilled in the art, and that the scope
of the present invention is accordingly to be limited by nothing other than the appended 
claims, in which reference to an element in the singular is not intended to mean "one 
and only one" unless explicitly so stated, but rather "one or more." All structural, 
chemical, and functional equivalents to the elements of the above-described preferred 
embodiment that are known to those of ordinary skill in the art are expressly 
incorporated herein by reference and are intended to be encompassed by the present 
claims. Moreover, it is not necessary for a device or method to address each and every 
problem sought to be solved by the present invention, for it to be encompassed by the 
present claims. Furthermore, no element, component, or method step in the present
disclosure is intended to be dedicated to the public regardless of whether the element, 
component, or method step is explicitly recited in the claims. No claim element herein is 
to be construed under the provisions of 35 U.S.C. 112, sixth paragraph, unless the 
element is expressly recited using the phrase "means for." 
Table 1
Reference for Notations

Symbol                  Meaning
$N$                     Set of nodes in the network
$N^i$                   Set of neighbors of node i
$S_j^i$                 Subset of $N^i$ to which node i forwards packets for destination j
$SG_j$                  Routing graph implied by the successor sets of destination j
$D_j^i$                 Distance of node i to node j as known to node i
$l_k^i$                 Cost of link (i, k)
$D_{jk}^i$              Distance of node k to j as reported to node i by node k
$FD_j^i$                Feasible distance, an estimate of $D_j^i$
$RD_j^i$                Distance to j as reported by node i to its neighbors
$SD_j^i$                Best distance to j through $S_j^i$
$QS_j^i$                Set of neighbors that are awaiting replies
$G(t)$                  An overview of the network at time t
$\mathcal{D}_j^i(t)$    Distance of node i to node j in G(t)
                        Cost of link (i, k) in G(t)
Table 2
Overhead Loading (worst-case message load, bits)

Topology and conditions       DBF      MDVA     MPATH
MCI (10 ms, 10 Mb/s)          62568    52352    32408
MCI (0.1 ms, 10 Mb/s)         78624    52840    32408
CAIRN (10 ms, 10 Mb/s)        39648    14056    6176
CAIRN (0.1 ms, 10 Mb/s)       37208    12992    5640
Table 3
Convergence Times (worst-case convergence time, ms)

Topology and conditions       DBF       MDVA      MPATH
MCI (10 ms, 10 Mb/s)          330.51    250.46    190.72
MCI (0.1 ms, 10 Mb/s)         4.36      2.51      2.62
CAIRN (10 ms, 10 Mb/s)        470.61    170.31    150.32
CAIRN (0.1 ms, 10 Mb/s)       4.07      2.14      1.82